70 research outputs found

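    Dictionary of Modern Slovene: From the Slovene Lexical Database to the Digital Dictionary Database (Rječnik suvremenoga slovenskog jezika: od slovenske leksičke baze do digitalne rječničke baze)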

    The ability to process language data has become fundamental to the development of technologies in many areas of human life in the digital world. Developing machine-readable linguistic resources, methods, and tools is therefore also a key challenge for contemporary Slovene. This challenge has been recognized in the Slovene language community at both the professional and the state level, and it has been the subject of many activities over the past ten years, which are presented in this paper. The idea of a comprehensive dictionary database covering all levels of linguistic description in modern Slovene, from the morphological and lexical levels to the syntactic level, was first formulated within the framework of the European Social Fund's Communication in Slovene (2008-2013) project; the Slovene Lexical Database (SLD) was also created within this project. Two goals were pursued in designing the SLD: creating linguistic descriptions of Slovene intended for human users, and making those descriptions useful for the machine processing of Slovene. Ever since the construction of the first Slovene corpus, it has been evident that modern Slovene needs a description based on real language data, and that useful language reference works require an understanding of the needs of language users. It also became apparent that only the digital medium allows a comprehensive language description, and that the design of the database must be adapted to that medium from the start. In addition, the description must follow best practices as closely as possible in terms of formats and international standards, as this enables the inclusion of Slovene in a wider network of resources, such as Linked Open Data, BabelNet and ELEXIS.
    Due to time pressures and trends in lexicography, procedures for automating the extraction of linguistic data from corpora and for including crowdsourcing in the lexicographic process were taken into consideration. Following the essential idea of creating an all-inclusive digital dictionary database for Slovene, two independent databases have been created over the past two years: the Collocations Dictionary of Modern Slovene and the automatically generated Thesaurus of Modern Slovene, both of which also exist as independent online dictionary portals. One of the novelties put forward with both dictionaries is the 'responsive dictionary' concept, which includes crowdsourcing methods. Ultimately, the Digital Dictionary Database provides all other levels of linguistic description: the morphological level with the Sloleks database upgrade, the phraseological level with the construction of a lexicon of multi-word expressions, and the syntactic level with the formalization of Slovene verb valency patterns. Each of these databases contains its own specific language data, which will ultimately be included in the comprehensive Slovene Digital Dictionary Database, representing basic linguistic descriptions of Slovene for both human and machine users.
    Summary (translated from Croatian): The idea of a comprehensive dictionary database covering all levels of linguistic description of modern Slovene, from the morphological and lexical to the syntactic, was first formulated within the Communication in Slovene project (2008-2013). To realize the idea of a comprehensive digital dictionary database, two independent databases were created: the Collocations Dictionary of Modern Slovene and the automatically generated Thesaurus of Modern Slovene. One novelty in both dictionaries is the concept of the responsive dictionary, which includes crowdsourcing. The digital dictionary database contains all levels of linguistic description: the morphological level, upgraded with Sloleks; the phraseological level, with a description of the constructions of multi-word units; and the syntactic level, with the formalization of verb valency patterns. Each of the existing databases contains specific language data that will be included in the comprehensive Slovene Digital Dictionary Database, which will hold the basic linguistic description of Slovene for both human and machine users.

    A Lexical Database for Slovene: For Whom, Why, and How (to Proceed)? (Leksikalna baza za slovenščino: komu, zakaj in kako (naprej)?)

    This article describes the guidelines for designing the Slovenian lexical database, especially the question of its various users and the types and manner of structuring lexical and grammatical information in the database. Special emphasis is placed on the scope and selection of lexical units and the arrangement of lexical and grammatical information, on the premise that information in the lexical database is primarily intended for web applications and modern electronic media.
    Summary (translated from Slovene): The paper describes guidelines for designing a lexical database for Slovene, in particular the question of its various users and the types and manner of structuring the lexico-grammatical data in it. It highlights dilemmas concerning the scope and selection of lexical units and the arrangement of lexico-grammatical data, on the premise that the data in the lexical database for Slovene will be intended primarily for web applications and modern electronic media.

    Dictionary of Modern Slovene: Lexicographic Tradition and/or Innovation (Slovar sodobnega slovenskega jezika: leksikografska tradicija in/ali inovacija)

    When the Proposal for the Compilation of a Dictionary of Modern Slovene (Predlog za izdelavo Slovarja sodobnega slovenskega jezika) was published at the end of May 2013, a debate developed in both professional forums and the media about whether the new dictionary of Slovene should follow the lexicographic tradition established by the Slovar slovenskega knjižnega jezika (SSKJ) or depart from it. Because differing views emerged on how to understand the dictionary tradition and on how to incorporate contemporary dictionary practices, this paper analyses the design of SSKJ and SNB, together with contributions that relate in any way to the concept of the future dictionary of Slovene, to establish which elements of lexicographic theory and practice can be regarded as traditional and which are proposed innovations in Slovene lexicography. In parallel, we propose a design for the new dictionary in its key segments, i.e. from the perspective of the user, the medium, and the use of language-technology knowledge, that would yield a description of modern Slovene meeting the needs of the language community in the present time and circumstances as fully as possible.

    Basic Elements of the Design of a Phraseological Dictionary (Temeljne prvine zasnove frazeološkega slovarja)

    Using an analytic-synthetic method of comparing lexicographic solutions in phraseological dictionaries, it is possible to identify the elements of dictionary design required for a comprehensive dictionary description of a phraseological unit. The specific design of a phraseological dictionary, as presented in the article, takes into account the connection between the phraseological and phraseographic systems. A survey of the design elements within individual segments of the dictionary description also suggests possible solutions for a phraseological dictionary of Slovene.

    Fixed Word Combinations in Slovene (Stalne besedne zveze v slovenščini)

    The central object of study in the book Stalne besedne zveze v slovenščini – korpusni pristop (Fixed Word Combinations in Slovene: A Corpus Approach) is lexical units that typically consist of more than one word. Special emphasis is placed on situating them within the contemporary Slovene lexicon on the basis of an empirical analysis of language data obtained from the Slovene reference electronic text corpora FIDA and FidaPLUS. This approach observes language exclusively on the basis of real texts, which form a universe of discourse and are captured in a concrete text corpus. Approaching the lexical description of Slovene on this basis offers Slovene linguistics a new vantage point, in terms of both the quality and quantity of language data and the methodology of linguistic analysis. The essential consequence of this approach is the blurring of boundaries between single-word and multi-word lexical units and the extension of phraseological issues beyond lexicology to syntax and text linguistics.

    Editorial

    Editorial (Uvodnik)

    With the first issue of its second volume, the journal Slovenščina 2.0: empirične, aplikativne in interdisciplinarne raziskave (Slovenščina 2.0: Empirical, Applied and Interdisciplinary Research), which those of us already familiar with it call SLO 2.0 for short, consolidates its central role in presenting the results of research on Slovene and other languages that combines an empirical and interdisciplinary, and especially language-technology, approach with an applied orientation. With the publication of issue 1 (2014), we are also bringing another novelty to Slovene-studies scholarly periodicals: continuous publication.

    Editorial (Uvodnik)

    Digitized language resources, natural language processing, corpus analyses of grammatical and other linguistic phenomena, text mining, taggers, extractors, lexicographic tools, speech synthesis, machine translation, avatar conversational partners, smart homes ... The common point: language.

    Editorial (Uvodnik)

    Defining collocation for Slovenian lexical resources

    In this paper, we define the notion of collocation for use in machine-readable language resources, which will serve in the creation of electronic dictionaries and language applications for Slovene. Based on theoretical and lexicographically driven studies, we define collocation as a lexical phenomenon characterized by three key aspects: statistical, syntactic, and semantic. We take lexicographic relevance as the point of departure for defining collocations within the typology of word combinations, as well as for distinguishing them from free combinations. Free combinations are (frequent) syntactically valid word combinations without lexicographic value, so there is no need to describe their meaning or syntactic role. Next, we distinguish collocations from all multiword lexical units (compounds, phraseological units and lexico-grammatical units) using the lexicographic view that multiword lexical units, whose meaning is not the sum of their parts, require a description of their meaning, whereas collocations do not. In the final part, we return to the three aspects of collocation and their role in the automatic extraction of collocational information from corpora. The semantic criterion, or the dictionary relevance of extracted collocations, has particularly exposed the problem of semantically broad collocates, such as certain types of adverbs, adjectives and verbs, and of words that feature in different syntactic roles (e.g. pronouns and adjuncts). We also discuss a particular issue of collocations related to proper names and the decisions about their inclusion in the dictionary, based on the evaluation of lexicographers.